(54) Improved video signal quantization for an MPEG-like coding environment.

(57) A quantization parameter for use in encoding a region of an image is developed from a) a 
categorization of the region into one of a predetermined plurality of perceptual noise sensitivity (PNS) 
classes, b) a level of psycho-visual quality that can be achieved for the encoded version of the image, the 
level being selected from among a plurality of predetermined levels, and c) a prestored empirically 
derived model of the relationship between the PNS classes, the psycho-visual quality levels and the 
values of the quantization parameter. PNS indicates the amount of noise that would be tolerable to a 
viewer of the region, i.e., the perceptual sensitivity of the region to noise. Some characteristics on which 
PNS classes may be based are : spatial activity, speed of motion, brightness of the region, importance of 
the region in a particular context, the presence of edges within the region and the texture of the region, 
e.g., from "flat" to "highly textured". PNS classes that include combinations of the characteristics of a
region of the image may also be defined. The PNS classes employed are selected by the implementor 
and may be determined empirically. The psycho-visual quality of an encoded image is the quality, as 
perceived by a viewer, of the version of the image that is reconstructed from the encoded image. It is 
determined from the complexity of the image and the bit-rate available to encode the image. 
Technical Field

This invention is related to video image processing and, more particularly, to the adjusting of encoder 
quantization step size so as to regulate the quality of the reconstructed image and the bit rate of the encoded 
image. 

Background of the Invention 

The manner in which a video signal is quantized determines the bit rate of the encoded signal and the 
quality of the image reconstructed from that encoded signal. Perhaps most significant in this context is the 
quantization step size, which is derived from a predetermined mapping of the so-called quantization parame- 
ters and which directly controls the coarseness/fineness of the quantization employed in developing the en- 
coded signal. Therefore, in order to achieve the maximum picture quality for a particular predetermined target 
bit rate, the quantization parameters need to be appropriately selected. 

Prior approaches to selecting the quantization parameters have been statistically based, have required 
computations necessitating that the entire image or portions thereof be processed multiple times, or have em- 
ployed models of the human visual system. These prior approaches are complex, require large amounts of 
memory or introduce large delays. Moreover, such prior solutions typically ignore the nature of non-homogeneous
regions of an image, such as edges. Furthermore, none of the prior solutions employs a single quantization
parameter in an effective manner. A single quantization parameter is in fact required, however, by the
video coding syntax of the Moving Picture Experts Group (MPEG), as set forth in the International Organization
for Standardization (ISO) standard Committee Draft 11172-2.

Summary of the Invention 

The invention is as defined in claims 1 and 7.

Brief Description of the Drawing 

In the drawing: ... 
Shown in FIG. 1, in simplified block diagram form, is an adaptive perceptual quantizer, in accordance with
the principles of the invention;

Shown in FIG. 2 is an exemplary macroblock divided into subblocks; 

Shown in FIG. 3 is the same macroblock shown in FIG. 2, with its subblocks labelled appropriately after
sorting; 

Shown in FIG. 4 are two arrays containing variance ratios; and 

Shown in FIG. 5 are the three possible types of nonhomogeneous macroblocks and all the allowed com- 
binations of assignments of subblocks therein to statistical activity classes in the EMAC and VMAC sub- 
groups. 

Detailed Description 

Shown in FIG. 1, in simplified block diagram form, is adaptive perceptual quantizer 100, employing the
principles of the invention, for use in a motion compensated predictive/interpolative video encoder. To aid in
the understanding of the operation of adaptive perceptual quantizer 100, also shown are subtracter 101 and
discrete cosine transform (DCT) 110, hereinafter referred to as DCT 110, which are part of the motion
compensated predictive/interpolative video encoder (not shown).

Original video signal VIDIN, which is a series of frames containing images, is supplied to subtracter 101
and perceptual noise sensitivity (PNS) categorizer 105. For interframe differentially encoded frames of signal
VIDIN, which encompass both predictively encoded and interpolatively encoded frames, subtracter 101 is
also supplied with signal PRED, which represents the predicted version of the frame that is being supplied as
signal VIDIN. If the frame of signal VIDIN is being intraframe coded, PRED is a null image, i.e., all zero.
Subtracter 101 subtracts a frame of PRED from the frame in signal VIDIN that the frame of PRED represents, and
generates a frame of prediction error signal PREDERR. Therefore, for interframe differentially encoded frames,
signal PREDERR represents images in the picture element, or pel, domain, which, when added to signal PRED,
yield signal VIDIN. If the frame of signal VIDIN is intraframe coded, signal PREDERR will be identical to signal
VIDIN.

Pels of signals VIDIN and PREDERR are grouped into subblocks which are two-dimensional arrays of pels.
A typical luminance subblock size is 8 x 8 pels. These 8 x 8 subblocks of pels may also be grouped together
into macroblocks. An exemplary macroblock includes four contiguous subblocks of luminance, arranged in a
16 x 16 array of pels, and all of the chrominance subblocks that are cosited with the luminance subblocks.
Only the luminance macroblocks and subblocks are employed during processing in the embodiment of the
invention described hereinbelow.

DCT 110 converts signal PREDERR to the discrete cosine domain and generates transform coefficients
that are supplied as signal DCTERR to quantizer 120. It is the transform coefficients of signal PREDERR that
are to be quantized by quantizer 120. DCT 110 operates on the aforementioned subblocks of pels of signal
PREDERR, e.g., 8 x 8 pels per subblock, when performing the discrete cosine transform. DCT 110 yields as
an output 8 x 8 subblocks of transform coefficients. Just as 8 x 8 subblocks of pels may be grouped into
macroblocks, as described above, so too these 8 x 8 subblocks of transform coefficients may also be grouped
together into macroblocks of transform coefficients.

Quantizer 120 contains memory 125 which stores a base quantizer step size matrix. The base quantizer
step sizes, which are the elements of the base quantizer step size matrix, are arranged such that one base step
size corresponds to, and will be employed for quantizing, one of the transformed coefficients of each subblock
of signal DCTERR. Thus, the base quantizer step size matrix is an 8 x 8 matrix.

In addition to signal DCTERR, quantizer 120 also receives as an input, from PNS categorizer 105, a
quantization parameter q_p. Quantizer 120 uses the base quantizer step size matrix and quantization parameter q_p
to generate signal DCTERRQ, a quantized version of the transformed error signal DCTERR, which is supplied
as an output. The actual quantizer step size to be employed for quantizing each coefficient of signal DCTERR
is developed by multiplying the value of q_p by a respective element of the base quantizer step size matrix.
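
The step-size rule just described can be illustrated with a minimal sketch. The function name, the flat base matrix and the rounding rule are assumptions made only for illustration; the text specifies just that each step size is q_p multiplied by the corresponding base matrix element.

```python
def quantize_subblock(coeffs, base_matrix, q_p):
    """Quantize an 8 x 8 subblock of DCT coefficients.

    Each coefficient is divided by its step size, q_p * base_matrix[i][j].
    Rounding to the nearest level is an assumption of this sketch.
    """
    return [
        [int(round(c / (q_p * b))) for c, b in zip(c_row, b_row)]
        for c_row, b_row in zip(coeffs, base_matrix)
    ]


# Hypothetical flat base matrix: every base step size equals 16.
base = [[16] * 8 for _ in range(8)]
# Hypothetical subblock in which every coefficient equals 100.
block = [[100.0] * 8 for _ in range(8)]

levels = quantize_subblock(block, base, q_p=2)  # step size 32 everywhere
```

Doubling q_p doubles every step size at once, which is why a single parameter per macroblock is enough to make the quantization of that macroblock coarser or finer.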

PNS categorizer 105 categorizes each macroblock of signal VIDIN into one of a predetermined plurality
of perceptual noise sensitivity (PNS) classes. PNS indicates the amount of noise that would be tolerable to a
viewer of the region, i.e., the perceptual sensitivity of the region to noise. The PNS classes are determined
based upon ranges of values of the visual characteristics that may be found in a region of an image. This is
because the sensitivity of the human eye to noise varies according to the nature of the visual characteristics
of the region within which the noise appears. Some characteristics on which PNS classes may be based are:
spatial activity, speed of motion, continuity of motion, brightness of the region, importance of the region in a
particular context, the presence of edges within the region and the texture of the region, e.g., from "flat" to
"highly textured". PNS classes that include combinations of the characteristics of a region of the image may
also be defined.

One goal of adaptive perceptual quantizer 100 is to adjust the placement of the noise that results from
quantization so that as much of the noise as possible appears where it is least visible, while simultaneously
ensuring that noise-sensitive areas of the image are relatively finely quantized. Therefore, flat and low-detail
areas, where blockiness (i.e., where the subblock boundaries become perceivable) can occur, must be
quantized relatively finely. However, busy and textured areas, where noise is less visible, can be quantized relatively
coarsely. In an embodiment that corresponds to the MPEG standard, only one quantization parameter q_p need
be developed for each macroblock.

In accordance with the principles of the invention, PNS categorizer 105 determines a quantization
parameter q_p for each macroblock as a function of both the PNS category into which the macroblock has been
categorized and a predetermined psycho-visual quality level Q that it is anticipated can be achieved for the
encoded version of the image in the frame being encoded. The psycho-visual quality level is received as an
input by PNS categorizer 105 from quality determination unit 130. The determination of the psycho-visual
quality level of the image in the frame will be discussed further below.

In the exemplary embodiment shown, the relationships between Q, q_p and the PNS classes are stored in
q_p table 135, within PNS categorizer 105. An exemplary q_p table is shown in Table 1. Typically this table will
have the same number of columns as the number of PNS categories the implementor has defined. In preferred
embodiments this table will have 20 columns. A table having a small number of classes, such as the table shown
in Table 1, will suffice to describe the illustrative embodiment and the principles of the invention.

TABLE 1: SAMPLE q_p TABLE

[Table body not recoverable; rows are indexed by psycho-visual quality level Q and columns by PNS class.]

Table 1 is arranged such that the index of the rows is the psycho-visual quality level Q and the index of
the columns is the PNS class. Although the Q for a frame may be adjusted to account for mismatch in the
number of bits produced with respect to the targeted number of bits that eventually results, as will be further
discussed below, it is assumed for clarity of exposition at this point that Q is constant during the encoding of
an entire frame.

The number and types of PNS classes employed in the q_p table are usually selected on the basis of
experimentation with real images. Furthermore, values of q_p in the q_p table are typically determined experimentally
for the range of qualities and types of images expected for a particular application. The objective of such
experiments would be to determine a q_p table wherein the values of q_p for all of the PNS classes at a particular
psycho-visual quality level produce approximately equal perceived noise throughout an image encoded for
that psycho-visual quality level, without regard for the content of the image, e.g., smoothness, speed of motion,
brightness, etc.

One such experiment, for determining the values of q_p to be placed in the q_p table, requires that an encoder
and a decoder be operated back-to-back at a constant psycho-visual quality level without any constraints being
placed on the channel bit rate. Q=0 is the minimum psycho-visual quality for the application for which the q_p
table is being developed. The q_p values for the Q=0 row of Table 1 would initially be derived by trial and error
so that an image in which the perceived noise is as uniform as possible is achieved. Thus, for a representative
image, the q_p values are adjusted, either one at a time or in combination, until a viewer indicates that he
perceives that the noise present in the image is distributed uniformly over the entire image. The values of q_p for
each PNS class are recorded in the q_p table. This procedure may be repeated over a set of images until the q_p
values for each PNS class remain essentially unchanged from image to image and the perceived noise in each
image of the set is indicated to be distributed uniformly throughout the image.

As an initial guess for the q_p values for the next highest psycho-visual quality level, e.g., for Q=1 in Table
1, each q_p value of the preceding psycho-visual quality level is reduced by one (1). This results in an effective
reduction in the quantization step size and an increase in the psycho-visual quality level of the reconstructed
image. If some portions of the image, corresponding to particular PNS classes, are perceived to have less
noise than the remainder of the image, the q_p values for those PNS columns are reduced until a uniform
distribution of the perceived noise is achieved throughout the image. Similarly, if some portions of the image,
corresponding to particular PNS classes, are perceived to have more noise than the remainder of the image,
the q_p values for those PNS columns are increased until a uniform distribution of the perceived noise is
achieved throughout the image. Once the perceived noise is uniform over the entire image, or set of images, as
described above, the new q_p values for each PNS class are recorded in the q_p table. This procedure is then
repeated for each row of the q_p table until the table is entirely filled.
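
As a small illustration of the starting point used for each new row, the following sketch derives an initial guess from the preceding row. The function name and the lower bound of 1 on q_p are assumptions of this sketch; the subsequent per-class adjustment is a subjective viewing experiment and is not modelled.

```python
def initial_guess_next_row(prev_row, q_min=1):
    """Initial q_p guesses for quality level Q+1, given the row for Q.

    Each value of the preceding row is reduced by one, as described in
    the text. Clamping at q_min is an assumption of this sketch.
    """
    return [max(q_min, q - 1) for q in prev_row]


# Hypothetical Q=0 row for three PNS classes.
row_q0 = [8, 10, 12]
row_q1_guess = initial_guess_next_row(row_q0)
```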

During encoding, PNS categorizer 105 categorizes each macroblock of signal VIDIN, which is being
encoded, into the one of the available PNS classes into which it best fits. Switch 140 is in position 1, and, in
accordance with an aspect of the invention, the PNS class into which the macroblock has been categorized is
supplied to the q_p table. The position of switch 140, as well as all other switches discussed below, is under
the control of a quantizer controller (not shown). In accordance with the principles of the invention, the
intersection of the row having an index value of the Q value anticipated for the frame, which is supplied from quality
determination unit 130 via switch 190, and the column having an index value equal to the PNS class into which
the macroblock has been categorized is determined. In accordance with an aspect of the invention, the q_p
value at the determined intersection is employed to encode the macroblock, as described above, and to this
end it is supplied to quantizer 120.
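
The row/column intersection can be sketched as a plain table lookup. The table values below are hypothetical placeholders, not the values of Table 1, and the names QP_TABLE and lookup_qp are introduced only for illustration.

```python
# Hypothetical q_p table: rows are indexed by psycho-visual quality level Q,
# columns by PNS class. These values are illustrative only.
QP_TABLE = [
    [6, 8, 10, 12],  # Q = 0 (lowest quality, coarsest quantization)
    [5, 7, 9, 11],   # Q = 1
    [4, 6, 8, 10],   # Q = 2
]


def lookup_qp(q_level, pns_class, table=QP_TABLE):
    """Return the q_p at the intersection of row Q and column PNS class."""
    return table[q_level][pns_class]


qp_for_macroblock = lookup_qp(q_level=1, pns_class=2)
```

Because the model is prestored, the per-macroblock cost during encoding is a single indexed read once the PNS class is known.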

In one embodiment of adaptive perceptual quantizer 100, the PNS classes are substantially based on the
spatial activity in a macroblock. This is because the variance of the macroblock is a primary indicator of the
spatial activity of the macroblock and, therefore, a primary indicator of the PNS class to which a macroblock
belongs. The variance of a macroblock can be computed by averaging the variances of the four luminance 8
x 8 subblocks included within the macroblock. The variances of each of the subblocks of a macroblock are
denoted v_a, v_b, v_c and v_d from left to right and top to bottom, respectively. Also, the maximum and minimum
variances v_max, v_min from among the subblocks of the macroblock and the average variance v_av for the
macroblock are determined.
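
A minimal sketch of these variance computations follows. It assumes a 16 x 16 luminance macroblock given as a list of rows and uses the population variance; both the data layout and the function names are assumptions of this sketch.

```python
def variance(pels):
    """Population variance of a flat list of pel values (an assumption;
    the text does not state which variance estimator is used)."""
    n = len(pels)
    mean = sum(pels) / n
    return sum((p - mean) ** 2 for p in pels) / n


def subblock_variances(macroblock):
    """Return (v_a, v_b, v_c, v_d) for a 16 x 16 macroblock, taking the
    8 x 8 subblocks left to right and top to bottom."""
    out = []
    for r0 in (0, 8):
        for c0 in (0, 8):
            pels = [macroblock[r][c]
                    for r in range(r0, r0 + 8)
                    for c in range(c0, c0 + 8)]
            out.append(variance(pels))
    return tuple(out)


def macroblock_stats(macroblock):
    """Average, maximum and minimum of the four subblock variances."""
    v = subblock_variances(macroblock)
    return {"v_av": sum(v) / 4, "v_max": max(v), "v_min": min(v)}
```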

The activity level present in a macroblock is analyzed in PNS categorizer 105, by examining the variances
of the subblocks therein, to determine if the macroblock contains a low detail area, a textured area, or several
different types of areas that tend to indicate the presence of edges. Macroblocks that contain only low-detail
or textured areas are referred to as homogeneous macroblocks and can immediately be categorized into
a PNS class based on the variance of the macroblock. Macroblocks that do not meet the requirements to be
declared homogeneous are declared to be nonhomogeneous.

Several tests can be performed to determine if a macroblock is homogeneous. A property of a homogeneous
macroblock is that the values of the variances of the four subblocks therein are "close" to one another.
The first test performed on the macroblock, in this embodiment, is the low-detail test which determines if the
entire macroblock is one of low detail. To pass the low-detail test and be classified as a low detail macroblock,
the following two conditions must both be satisfied:

(i) v_av < T_1, and

(ii) v_max < T_2

where T_1 and T_2 are predetermined thresholds for 8-bit pixels having values ranging from 0 to 255. A typical
value of threshold T_1 is 45. Threshold T_2 is typically four times T_1.

If a macroblock does not pass the low-detail test for homogeneity, a texture test is performed on the
macroblock to determine if the macroblock is a homogeneous texture macroblock. To pass the texture test, and be
classified as a homogeneous texture macroblock, condition (iii) and either of conditions (iv) or (v) below must
be satisfied.

(iii) v_min > T_1.

(iv) Two of the three ratios v_max / v_sbk < T_3 and the third ratio < T_4.

(v) All three ratios v_max / v_sbk < T_5.

where v_sbk refers to the value of the variance of an individual subblock whose variance is less than v_max.
Typically, T_3 is 2.5, T_4 is 4.0, and T_5 is 3.25. In the division operations performed above, if the variance of a subblock
is less than a threshold T_0, it is set to T_0. This avoids division by zero as well as enabling more meaningful
ratio tests. A typical value for T_0 is 10. It is also worth noting that the thresholds used in these tests can be
modified so as to control the number of macroblocks that are classified as homogeneous with respect to the
number of macroblocks that are classified as nonhomogeneous. As thresholds T_3, T_4 and T_5 are increased,
the number of macroblocks that will be declared as homogeneous increases proportionately.
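
The two homogeneity tests can be sketched as follows, using the typical threshold values quoted in the text. Condition (iii) is taken here to be v_min > T_1 and the ratios in (iv) and (v) to be v_max / v_sbk; both readings are assumptions, since the printed conditions are partly illegible. The function name and return labels are likewise illustrative.

```python
# Typical threshold values quoted in the text.
T0, T1, T3, T4, T5 = 10, 45, 2.5, 4.0, 3.25
T2 = 4 * T1  # "typically four times T_1"


def classify_homogeneity(variances):
    """Classify a macroblock from its four subblock variances.

    Returns "low-detail", "texture" or "nonhomogeneous".
    """
    v = list(variances)
    v_av, v_max, v_min = sum(v) / len(v), max(v), min(v)

    # Low-detail test: conditions (i) and (ii).
    if v_av < T1 and v_max < T2:
        return "low-detail"

    # Texture test. Variances below T0 are raised to T0 before dividing.
    clamped = sorted(max(x, T0) for x in v)
    top = clamped[-1]
    ratios = sorted(top / x for x in clamped[:-1])  # three ratios, ascending
    cond_iii = v_min > T1                            # assumed reading of (iii)
    cond_iv = ratios[0] < T3 and ratios[1] < T3 and ratios[2] < T4
    cond_v = all(r < T5 for r in ratios)
    if cond_iii and (cond_iv or cond_v):
        return "texture"
    return "nonhomogeneous"
```

Sorting the ratios makes condition (iv) easy to check: if any choice of two ratios satisfies it, the two smallest do, with the largest playing the role of the third ratio.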

In this embodiment, a homogeneous macroblock is assigned a PNS class substantially on the basis of its
variance. The potential range of macroblock variance is partitioned into 16 predetermined intervals, and one
PNS class is associated with each interval. The thresholds defining these intervals are given in Table 2. The
PNS class of a homogeneous macroblock is determined by the interval in which the variance of the macroblock
lies. The basic 16 PNS classes of this embodiment are designated C_0, ..., C_15. The PNS thresholds are much
closer at the low-variance end, where fine resolution is beneficial.

The PNS thresholds may be found by experimentation. Such an experiment would begin with a large
number of PNS classes having fairly closely spaced thresholds. The q_p values would then be found, as described
above. Thereafter, adjacent columns of the q_p matrix are examined to see if they are substantially the same.
If they are, the two corresponding PNS classes are merged into one. The merging of adjacent columns
continues until all the columns are sufficiently different from each other that no further merging can be performed.
TABLE 2: PNS CLASSES

Homogeneous PNS Class    Variance Range
C_0                      0-4
C_1                      5-14
C_2                      15-29
C_3                      30-54
C_4                      55-94
C_5                      95-159
C_6                      160-264
C_7                      265-434
C_8                      435-709
C_9                      710-1069
C_10                     1070-1564
C_11                     1565-2239
C_12                     2240-3159
C_13                     3160-4414
C_14                     4415-6129
C_15                     6130-100,000



In this embodiment, it has been found that the experimentally derived variance thresholds of Table 2 can
be described by employing a simple recursive process. L_i denotes the variance threshold at the high end of
the i-th variance interval, where i ranges from 1 to 16. Δ_i is defined as equal to L_i - L_{i-1}. The initial conditions
are set such that L_0 = 0 and Δ_0 = Δ_1 = 5. The recursion

    Δ_i = Δ_{i-1} + Δ'_{i-2}

along with the definition of Δ_i yields all the values of L_i.
In the above, Δ' is defined as follows:
for i from 2 to 9,

    Δ'_{i-2} = Δ_{i-2},

and for i from 10 to 16,

    Δ'_{i-2} = ⌊Δ_{i-2}/10⌋ * 5.
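
The recursion can be checked numerically with the short sketch below. It assumes that for i from 2 to 9 the adjusted term equals the unadjusted increment from two steps earlier (the printed right-hand side of that case is illegible), and that for i from 10 to 16 it is that increment divided by 10, rounded down, and multiplied by 5. Under these assumptions the generated thresholds L_1 to L_15 coincide with the lower bounds of classes C_1 to C_15 in Table 2.

```python
def pns_thresholds(n=16):
    """Return [L_0, L_1, ..., L_n] from the recursion on the increments.

    delta[i] holds the increment L_i - L_{i-1}; delta[0] is only a seed.
    """
    delta = [5, 5]  # initial conditions for the increments
    for i in range(2, n + 1):
        prev2 = delta[i - 2]
        # Assumed definition of the adjusted term (see lead-in).
        adjusted = prev2 if i <= 9 else (prev2 // 10) * 5
        delta.append(delta[i - 1] + adjusted)

    thresholds = [0]  # L_0 = 0
    for i in range(1, n + 1):
        thresholds.append(thresholds[-1] + delta[i])
    return thresholds


L = pns_thresholds()
```

Note that the recursion gives L_16 = 8470, whereas Table 2 simply lists 100,000 as the upper end of the last class.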

If a macroblock satisfies neither the low-detail test nor the texture test, it is declared to be nonhomogeneous.
Nonhomogeneous macroblocks are likely to occur at or near edges of objects in the scene. Perceptual
noise sensitivity at these locations depends on many factors, including edge sharpness, object size, overall
brightness, etc. Thus, there are a wide variety of possible PNS classes that can be defined for nonhomogeneous
edge macroblocks. However, it has been found experimentally that a simple, but effective, method of
categorizing such macroblocks into PNS classes is to use as the PNS class for the macroblock the PNS class
from Table 2 that is associated with the variance range into which the minimum subblock variance v_min falls.
This is because such a q_p is good enough to quantize the most noise-sensitive subblock of the macroblock
so that noise caused by the quantization process will be within that allowed by the psycho-visual quality level at
which the frame is being encoded. Furthermore, the remaining subblocks of the macroblock being quantized
will be quantized more finely than is actually necessary, because they are also quantized with the same q_p
value as is employed for the subblock with variance v_min. Such a result is acceptable, however, despite requiring
more bits to encode these remaining subblocks than might have otherwise been necessary, because there
is no increase in the noise perceived within those subblocks beyond that allowed by the psycho-visual quality
level of the frame.

As an example of the above-described technique, if a nonhomogeneous macroblock included an area of
low detail including two adjacent subblocks and an area of texture containing two adjacent subblocks, the low
detail area would have v_min, because low detail areas have lower variances than textured areas. The macroblock
is categorized into the PNS class that would have resulted if the entire macroblock was homogeneous
and had four subblocks that were the same as the subblocks in the low detail area, i.e., as if the macroblock
variance was v_min. As a result, the q_p selected for the entire macroblock is one that is sufficient to quantize
the low detail area with the necessary fineness for such an area so as to not introduce therein additional
perceivable noise beyond the level of noise acceptable for the psycho-visual quality level at which the image is
being encoded. This same q_p value, however, is also employed for the textured area of the macroblock. Since
the textured area could have tolerated a larger q_p value than the one actually employed, it is simply encoded
to have less noise than could actually be tolerated with the psycho-visual quality level. Employing the PNS
class corresponding to v_min is therefore a conservative choice.
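
The class selection for both kinds of macroblock can be sketched as a lookup against the variance ranges of Table 2. The function names and the use of the standard-library bisect module are choices made for this sketch; treating fractional variances by the same boundaries is also an assumption.

```python
import bisect

# Lower bounds of the Table 2 variance ranges for classes C_1 .. C_15.
PNS_LOWER_BOUNDS = [5, 15, 30, 55, 95, 160, 265, 435, 710,
                    1070, 1565, 2240, 3160, 4415, 6130]


def pns_class_for_variance(v):
    """Index k of the class C_k whose Table 2 range contains v."""
    return bisect.bisect_right(PNS_LOWER_BOUNDS, v)


def pns_class_for_macroblock(subblock_variances, homogeneous):
    """Homogeneous macroblocks are classified by their average variance;
    nonhomogeneous ones by the minimum subblock variance, which is the
    conservative choice described above."""
    v = list(subblock_variances)
    key = sum(v) / len(v) if homogeneous else min(v)
    return pns_class_for_variance(key)
```

For instance, a nonhomogeneous macroblock with subblock variances 900, 60, 2000 and 100 is classified by 60, which lies in the 55-94 range of C_4.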

The content of a homogeneous macroblock can range from smoothness in a macroblock belonging to one
of the low-variance PNS classes to fine texture in a macroblock belonging to one of the medium-variance PNS
classes and ultimately to coarse texture in a macroblock belonging to one of the high-variance PNS classes.
As described above, the low-variance PNS classes are generally highly sensitive to noise. However, if some
of these low-variance PNS classes have very low or very high brightness they can be further classified into
PNS classes that particularly indicate lower noise sensitivity than for a PNS class that has the same level of
spatial activity but only a moderate brightness. Other factors, such as speed of motion in the scene or contextual
aspects such as foreground/background, can also be taken into account by defining additional PNS classes,
some of which may also have variance ranges overlapping with those defined in Table 2.

In accordance with an aspect of the invention, additional PNS classes of any type would add additional
columns to Table 1. For example, in Table 1, PNS class C_3a corresponds to a low detail, but very high or very
low brightness PNS class while C_3b corresponds to a PNS class having the same spatial activity level as C_3a
but a moderate brightness. The methodology employed to define such classes is similar to the above-described
method and the additional classes created are added to Table 1 as additional columns. If more than one PNS
class exists for a particular variance level, such as C_3a and C_3b, when a nonhomogeneous macroblock is
encoded the PNS class that is more sensitive to noise, such as PNS class C_3b, should be chosen. As described
above, this is a further conservative choice because it guarantees that all the subblocks of the macroblock,
including the most noise sensitive subblock, are encoded to permit no more noise to be perceived than is
permitted by the psycho-visual quality level at which the frame is being encoded.

Prediction error signal PREDERR is supplied to bits estimator unit 115. In accordance with an aspect of
the invention, based upon the contents of signal PREDERR each subblock of every macroblock is assigned
to a statistical activity class (SAC) so that later estimates of the number of bits required to encode a frame
versus the psycho-visual quality of the encoded image can be made. One subgroup of statistical activity classes,
referred to as the variance model activity classification (VMAC) subgroup, includes statistical activity classes
based only on the subblock variances of signal PREDERR. Each SAC in the VMAC subgroup corresponds
to a range of variances for signal PREDERR, as shown in Table 3. The similarity of Table 3 for VMAC subgroup
statistical activity classes to that of Table 2 for PNS classes is noted, but such similarity is merely the result
of implementational choices. Table 3 is employed by SAC categorizer 170 to perform the SAC classification
for subblocks that are included within homogeneous macroblocks.
TABLE 3: STATISTICAL ACTIVITY CLASSES OF THE VMAC SUBGROUP

SAC Class      Variance Range
V_0 to V_8     [ranges not recoverable]
V_9            713-1075
V_10           1076-1570
V_11           1571-2245
V_12           2246-3165
V_13           3166-4420
V_14           4421-6141
V_15           6142-100,000



If there are sharp discontinuities in the PREDERR signal, a further subgroup of statistical activity classes 
can give improved results. Such a subgroup, referred to as an edge model activity classification (EMAC) sub- 
group, is defined further below. Thus, the statistical activity classes are divided into two subgroups, VMAC 
for relatively continuous areas and EMAC for areas containing high discontinuity. 

In this embodiment, areas of high discontinuity are detected using methods similar to the
homogeneous/nonhomogeneous segmentation in the above-described PNS classification, except that variances of
subblocks of signal PREDERR are used instead of variances of subblocks of signal VIDIN. Subblocks that occur
in areas of continuous variances are designated as VMAC, and Table 3 is employed by SAC categorizer 170 
to perform the SAC classification by determining the SAC associated with the variance range into which the 
variance of the subblock falls. For subblocks that occur in areas of discontinuous variance, further processing 
is carried out as described below. Other well known methods for detection of discontinuity/edge areas may 
also be employed. 

For high discontinuity/edge areas, four statistical activity classes belonging to the EMAC subgroup are
defined. The first three EMAC classes, E_0 to E_2, depend upon the variance of the subblock. E_0 denotes a weak
edge within a subblock having a variance strength between 150 and 649, E_1 denotes a normal edge within
a subblock having a variance strength between 650 and 1899, and E_2 denotes a strong edge within a subblock
having a variance strength above 1900. The fourth EMAC class E_3 is used for horizontal or vertical edges that
have a variance difference strength exceeding 650. These statistical activity classes are shown in Table 4,
which is employed by SAC categorizer 170 to perform the SAC classification of subblocks that are in areas
of high discontinuity/edges, as described below.
TABLE 4: STATISTICAL ACTIVITY CLASSES OF THE EMAC SUBGROUP

SAC      Variance Range
E_0      150-649
E_1      650-1899
E_2      >1900
E_3
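
The variance-based part of Table 4 can be sketched as a simple threshold cascade. Treating a variance of exactly 1900 as E_2, and returning None below 150, are assumptions of this sketch; E_3 is not assigned from variance alone and is therefore not produced here.

```python
def emac_class(variance):
    """Initial EMAC class of a subblock from its PREDERR variance.

    Follows the E_0..E_2 ranges of Table 4. The boundary value 1900 is
    assigned to E_2 (an assumption), and variances below 150 fall outside
    the listed ranges, so None is returned for them.
    """
    if variance >= 1900:
        return "E2"
    if variance >= 650:
        return "E1"
    if variance >= 150:
        return "E0"
    return None
```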





The process employed for subblock classification requires being able to distinguish edges from texture.
To do this, the four subblock variances are evaluated in relation to their location inside the nonhomogeneous
macroblock.

A subblock can belong to an SAC in either the VMAC or the EMAC subgroup. Shown in FIG. 2 is
exemplary macroblock 201 divided into subblocks 204, 206, 208 and 210. Each of subblocks 204, 206, 208
and 210 also has a corresponding PREDERR variance, respectively, v_a, v_b, v_c, and v_d. For purposes of providing
a numerical example, v_a = 900, v_b = 60, v_c = 2000, and v_d = 100. The subblock variances are sorted by
employing pairwise comparisons of the subblock variances. Six such comparisons are performed, three to
determine v_max, two for v_min, and the last one for v_midh and v_midl, the high and low middle variance values,
respectively. Shown again in FIG. 3 is macroblock 201, with each of subblocks 204, 206, 208 and 210 labelled
appropriately after sorting, in accordance with the above given numerical example.
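
The labelling that results from the sort can be sketched for the numerical example as follows. The sketch uses Python's built-in sort rather than the six explicit pairwise comparisons described in the text, and the function name and dictionary layout are illustrative assumptions.

```python
def sort_subblock_variances(variances):
    """Map the labels v_max, v_midh, v_midl, v_min to subblock identifiers.

    `variances` maps a subblock identifier to its PREDERR variance.
    """
    ordered = sorted(variances.items(), key=lambda item: item[1], reverse=True)
    labels = ("v_max", "v_midh", "v_midl", "v_min")
    return {label: subblock for label, (subblock, _) in zip(labels, ordered)}


# Numerical example from the text: v_a = 900, v_b = 60, v_c = 2000, v_d = 100.
example = {204: 900, 206: 60, 208: 2000, 210: 100}
labelled = sort_subblock_variances(example)
```

In this example subblock 208 carries v_max, 204 carries v_midh, 210 carries v_midl and 206 carries v_min.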

Again, as described above, if the value of any subblock variance is less than the above-described threshold
T_0, it is set to T_0. Three variance ratios, r_0, r_1 and r_2, are computed from the sorted subblock variances. These ratios
are sorted with two pairwise comparisons to determine from among them r_max, r_mid, and r_min, respectively.
Shown in FIG. 4 is array 401 containing variance ratios 403. Also shown is array 401 showing the results of 
the ratio sorting for the above given numerical example. 

As a first step, the subblock having variance v_min is initially assigned to the one of the statistical activity
classes within the VMAC subgroup determined by its variance and Table 3. Also, the subblock having variance
v_max is initially assigned to the one of the statistical activity classes within the EMAC subgroup determined by
its variance and Table 4. Therefore, in the numerical example, subblock 206 is assigned to SAC V_4 from among
the VMAC subclasses and subblock 208 is assigned to SAC E_2 from among the EMAC subclasses.

Thereafter, the two subblocks whose variance ratio is r_min are merged to create a first area. Shown in FIG.
3 is first area 301 created by merging subblocks 204 and 208, the ratio of whose variances is r_min. If one of
the subblocks of the first area has variance v_max, this area is initially declared an edge area and both subblocks
therein are assigned to one of the statistical activity classes within the EMAC subgroup. Since subblock 208
has variance v_max, first area 301 is declared an edge area and accordingly subblocks 204 and 208 are labelled
E. Initially the particular SAC of the EMAC subgroup into which each is categorized is determined from the
variance of the subblock and Table 4. However, this initial SAC may be changed, as described below. For the
numerical example this initial SAC is E_3. Alternatively, should one of the subblocks of the first area have variance
v_min, each of the subblocks is assigned to one of the VMAC classes, according to its respective variance and
Table 3. Such subblocks would be labelled V. If the area contains subblocks with only variances v_midh and v_midl,
further examination is required to classify the subblocks.

Subblocks having a variance ratio of r_mid are merged into a second area, and both of the subblocks are
assigned to statistical activity classes of the same subgroup, either the VMAC subgroup or the EMAC subgroup,
to which one of them had previously been assigned. In the numerical example, r_0 = r_mid so that subblocks
206 and 210 are included in second area 303. Since subblock 206 was already categorized into an SAC of
subgroup VMAC, so too is subblock 210, which is also, accordingly, labelled V. For the numerical example,
subblock 210 is categorized into SAC V_5.

If the two subblocks whose variance ratio is r_min were categorized into an SAC of the EMAC subgroup,
both subblocks are not of the E_1 class, r_min < 2, and the subblocks are {204, 206}, {208, 210}, {204, 208} or
{206, 210}, it is determined that a strong horizontal or vertical edge exists. In such a situation, each of the
subblocks whose variance ratio is r_min is assigned to the E_3 class. In the numerical example, if v_a is changed
from 900 to 1900, all the conditions for a vertical edge would exist in subblocks 204 and 208. Therefore, each
of subblocks 204 and 208 would be assigned to the E_3 SAC.

Shown in FIG. 5 are the three possible types of nonhomogeneous macroblocks and all the allowed
combinations of assignments of subblocks of the macroblocks to statistical activity classes in the EMAC (E) and
VMAC (V) subgroups. In particular, macroblocks 501, 503, 505, 507, 509, 511, 513 and 515 are candidates
to have a strong horizontal or vertical edge. This is because each of the enumerated macroblocks contains
adjacent subblocks which have been categorized into statistical activity classes of the EMAC subgroup.

Returning to FIG. 1, for purposes of determining Q for the current frame, histogram counter (HIST) 145
must contain an estimate of the number of subblocks in each SAC that are also in a particular PNS class prior
to the start of encoding. These estimates are kept in a histogram table, HIS[PNS][SAC], contained within HIST 145.
Thus, the estimates are the table entries in the histogram table, while the rows of the histogram table are
indexed by PNS classes and the columns are indexed by the statistical activity classes.

In an exemplary embodiment, the histogram values stored in HIST 145 are computed while a previous
frame is being encoded. All the values stored in the histogram table are cleared to zero (0) by HIST 145 prior
to the starting of the encoding of the previous frame. Switches 140, 155, 160, 165 and 190 are in position 1
when the previous frame is being encoded. As each macroblock of the previous frame is processed, it is
categorized into one of the predetermined PNS classes by PNS categorizer 105, which supplies the PNS class to
HIST 145 via switch 165. Similarly, for each subblock of each of the macroblocks, SAC categorizer 170
produces an SAC which is supplied to HIST 145 via switch 160. At the intersection of each row and column of the
histogram table is stored the number of subblocks of the previous frame that have already been processed
and were categorized into both the same particular PNS class and the same particular statistical activity class.
As each subblock is processed, the location in the histogram table that is at the intersection of the row and
column corresponding to both the SAC and the PNS class into which the subblock has just been categorized
is incremented. The values that are developed and stored in the histogram table by the end of the encoding
of the previous frame are then employed as the estimate of the number of subblocks in each SAC that are
also in a particular PNS class for the current frame.
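
The accumulation of the histogram table can be sketched as follows. This is a minimal illustration, assuming that PNS classes and statistical activity classes are represented as small integer indices; the function and argument names are hypothetical and do not appear in the text.

```python
def build_histogram(subblock_classes, num_pns, num_sac):
    """Count subblocks per (PNS class, SAC) pair for one frame.

    `subblock_classes` is an iterable of (pns, sac) index pairs, one per
    subblock, where `pns` is the class of the enclosing macroblock.
    Rows are indexed by PNS class and columns by SAC, as in HIS[PNS][SAC].
    """
    # The table is cleared to zero before the frame is processed.
    his = [[0] * num_sac for _ in range(num_pns)]
    for pns, sac in subblock_classes:
        # Increment the entry at the intersection of row and column.
        his[pns][sac] += 1
    return his


# Illustrative data only: four subblocks of a previous frame.
previous_frame = [(0, 1), (0, 1), (2, 0), (1, 3)]
his = build_histogram(previous_frame, num_pns=3, num_sac=4)
print(his)  # [[0, 2, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]]
```

The table built from the previous frame then serves as the estimate for the current frame.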

The selection of the psycho-visual quality level is performed once for each frame prior to the beginning
of the encoding for that frame. At that time, quality determination unit 130 receives as an input a target number
of bits for the frame to be encoded from the motion compensated predictive/interpolative video encoder (not
shown), as well as estimates, from bits estimator 115, of the number of bits that would be necessary to encode
the frame when each possible psycho-visual quality level is used. Quality determination unit 130 compares
the estimated number of bits for each of the psycho-visual quality levels against the target number of bits for
the frame and selects the psycho-visual quality level that corresponds to the estimate that most closely matches,
and is less than, the target number of bits.
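
The selection rule can be sketched in a few lines. This is an illustrative sketch only: the function name, the dictionary representation of the estimates and the sample numbers are assumptions. An estimate equal to the target is accepted here, following the "does not exceed" wording of claim 5, and the case in which every estimate exceeds the target is not addressed by the text, so the sketch simply returns None.

```python
def select_quality_level(estimates, target_bits):
    """Pick the quality level whose estimate is closest to the target
    without exceeding it.

    `estimates` maps each psycho-visual quality level to its estimated
    number of bits for the frame.
    """
    # Keep only levels whose estimate does not exceed the target.
    feasible = {q: bits for q, bits in estimates.items() if bits <= target_bits}
    if not feasible:
        return None  # Behaviour in this case is an assumption.
    # Among those, the largest estimate is the closest to the target.
    return max(feasible, key=feasible.get)


# Hypothetical estimates for four quality levels.
estimates = {1: 90_000, 2: 120_000, 3: 150_000, 4: 200_000}
chosen = select_quality_level(estimates, target_bits=160_000)
print(chosen)  # 3
```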

To estimate the number of bits that will be generated by encoding the current frame at each psycho-visual
quality level, bits estimator 115 employs a) the values stored in the histogram table, which are estimates of
the number of subblocks in each statistical activity class (SAC) that are also in a particular PNS class, b) a
prestored bits table 150 (see Table 5), the values of which indicate the estimated number of bits that will be
generated if a subblock in a particular statistical activity class is encoded with a particular q_p, and c) the q_p
values supplied by PNS categorizer 105. To determine such an estimate, switches 140, 155, 160, 165 and 190
are all placed in position 2. For a particular psycho-visual quality level Q', the estimate of the number of bits
that are required to encode a frame at Q' is given by

\[
\sum_{PNS_{indx}} \sum_{SAC_{indx}} \mathrm{HIS}[PNS_{indx}][SAC_{indx}] \times \mathrm{BITS\_TABLE}\big[\, q_p\mathrm{\_TABLE}[Q'][PNS_{indx}] \,\big][SAC_{indx}]
\]

where PNS_indx is a variable whose range spans over the entire plurality of PNS classes, SAC_indx is a variable
whose range spans over the entire plurality of statistical activity classes and HIS[PNS_indx][SAC_indx] is the
corresponding value from the histogram table. Values of PNS_indx and SAC_indx are supplied by PNS indx 175 and
SAC indx 180, respectively, so as to perform the double sum over all of the PNS classes and all of the statistical
activity classes. q_p_TABLE[Q'][PNS_indx] corresponds to the value of q_p for psycho-visual quality level Q'
and PNS class PNS_indx. The value of Q' spans over the entire range of psycho-visual quality levels. The number
of bits is computed once for each Q' value. Q' is supplied by Q' indx 185. The timing of the supplying of the
values by PNS indx 175, SAC indx 180 and Q' indx 185 is synchronized by a controller, not shown. Such
synchronization is well known by those skilled in the art.
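
The double sum can be written out directly. The following is a minimal sketch, assuming the histogram is a list of rows indexed by PNS class, the q_p table is indexed first by quality level and then by PNS class, and the bits table is keyed by q_p and then indexed by SAC. All table contents below are invented solely to exercise the arithmetic.

```python
def estimate_bits(his, bits_table, qp_table, q_level):
    """Estimate the bits needed to encode a frame at quality level `q_level`.

    Implements the sum over all PNS classes and all SACs of
    HIS[pns][sac] * BITS_TABLE[q_p_TABLE[q_level][pns]][sac].
    """
    total = 0
    for pns, row in enumerate(his):
        # Quantization parameter for this PNS class at this quality level.
        qp = qp_table[q_level][pns]
        for sac, count in enumerate(row):
            total += count * bits_table[qp][sac]
    return total


# Hypothetical tables: 2 PNS classes, 2 SACs, 2 quality levels.
his = [[2, 0], [1, 3]]
qp_table = {0: [8, 4], 1: [4, 2]}                   # qp_table[Q'][PNS]
bits_table = {2: [10, 20], 4: [6, 12], 8: [3, 6]}   # bits_table[q_p][SAC]

per_level = {q: estimate_bits(his, bits_table, qp_table, q) for q in qp_table}
print(per_level)  # {0: 48, 1: 82}
```

One estimate is produced per quality level, which is the set of values that quality determination unit 130 compares against the target.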

Shown in Table 5 is an abridged exemplary bits table 150. The methods for developing such a table will
be readily apparent to those skilled in the art.

TABLE 5: EXAMPLE OF BITS TABLE (FOR SUBBLOCKS)

Despite the best efforts to encode a picture within a specified number of bits, it can happen that the number
of bits actually generated exceeds the specification, perhaps greatly so. This is typically the result of the fact
that the histogram table is only an estimate of the distribution of the subblocks over the PNS classes and the
statistical activity classes, and such estimates can be wrong, especially at scene changes. To prevent the typical
buffer of the motion compensated predictive/interpolative video encoder (not shown) from overflowing, the
fullness of the buffer is monitored by PNS categorizer 105 at regular intervals, for instance five times per frame.
An indication of the fullness of the encoder buffer is received by PNS categorizer 105 as signal buffer fullness.
Depending on the buffer fullness, the Q being employed as the index into the q_p table for the frame being
encoded can be adjusted in a progressive and orderly fashion. The nature and direction of these adjustments to
Q will vary with the fullness of the buffer. Decreasing the Q employed results in a more coarse quantization and
fewer bits produced, at the expense of the psycho-visual quality perceived by the viewer. However, such an
orderly reduction in the psycho-visual quality is preferable to permitting the buffer to overflow.
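
A progressive adjustment of this kind might look like the sketch below. The text does not give thresholds or step sizes, so the threshold values, the one-step-per-threshold rule and the function name are all hypothetical. Only the downward direction is shown, although the text notes that the direction of adjustment varies with buffer fullness.

```python
def adjust_quality_for_buffer(q_level, fullness, min_level=0,
                              thresholds=(0.7, 0.8, 0.9)):
    """Lower Q by one step for each fullness threshold that is exceeded.

    `fullness` is the fraction of the encoder buffer in use (0.0 to 1.0).
    The thresholds are illustrative assumptions, not values from the text.
    """
    steps = sum(1 for t in thresholds if fullness > t)
    # A lower Q means coarser quantization and fewer bits.
    return max(min_level, q_level - steps)


print(adjust_quality_for_buffer(4, 0.50))  # 4
print(adjust_quality_for_buffer(4, 0.85))  # 2
print(adjust_quality_for_buffer(1, 0.95))  # 0
```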

The foregoing merely illustrates the principles of the invention. Thus, although various components of
adaptive perceptual quantizer 100 are shown as discrete functional elements, their respective functions will
typically be realized by appropriate program code executing in a processor, in a manner well known in the art.
It will thus be appreciated that those skilled in the art will be able to devise various arrangements which, al- 
though not explicitly described or shown herein, embody the principles of the invention and are thus within 
its spirit and scope. 

For example, in one such embodiment the quantizer step size matrix stored in memory 125 can be
downloaded from a storage unit (not shown) once for each frame. Therefore, the matrix need not be the same from
frame to frame. Furthermore, more than one q_p table 135 may be stored in PNS categorizer 105. Which of the
stored tables is used for encoding any particular frame may be made dependent upon the type of coding used
for the particular frame. Thus, intraframe coded (I) frames could have a first q_p table, motion compensated
predictively coded (P) frames could have a second q_p table and motion interpolated (B) frames could
have a third q_p table. In another embodiment, signal PREDERR may be used in addition to or instead of VIDIN
in the PNS classification. This is indicated by the dashed line input to PNS categorizer 105.

In an alternative embodiment, the frame to be encoded may be processed in accordance with a two pass
process. So that two passes may be achieved, the portion of signal VIDIN corresponding to the frame is
temporarily buffered in a memory. During the first pass, the actual distribution of the subblocks over the PNS
classes and statistical activity classes of the frame to be encoded is determined and stored in the histogram table.
As a result, the estimated number of bits required to encode the frame for each psycho-visual quality level
will be the actual number of bits. Therefore, after the best matching psycho-visual quality level is selected by
quality determination unit 130, the number of bits produced during the actual encoding of the frame, which is the
second pass, will be exactly the number that was determined for that psycho-visual quality level during the
first pass. Therefore, no corrections will be necessary.

In a further alternative embodiment, HIST 145 may actually contain more than one histogram table or the
histogram table may have more than one region. In such an embodiment one histogram is stored per frame
type, e.g., I, P and B frames. The histogram that is actually used in computing the number of bits for a particular
frame is the one stored for the same type of frame.

Claims 

1. A method for use in generating quantization parameters to be employed by a video coder when said video
coder is processing at least a portion of a video signal comprised of frames, the method CHARACTERIZED
BY the steps of:

dividing a particular one of said frames into a plurality of regions; 

categorizing each of the regions into one of a plurality of predetermined perceptual noise sensitivity 
classes; 

selecting a target psycho-visual quality level for the frame from among a plurality of predetermined 
target psycho-visual quality levels; and 

providing a quantization parameter for each of said regions as a function of the perceptual noise 
sensitivity class of each of said regions and said psycho-visual target quality level. 

2. The invention as described in claim 1 CHARACTERIZED IN THAT said step of categorizing is CHARAC- 
TERIZED BY the steps of: 

determining a perceptual noise sensitivity level for each of the regions; and 
mapping each perceptual noise sensitivity level into a corresponding one of a plurality of predeter- 
mined perceptual noise sensitivity classes. 

3. The invention as described in claim 1 CHARACTERIZED IN THAT said step of selecting employs a
predetermined function that relates the target psycho-visual quality level to an estimated complexity of
the video signal of the frame and a number of bits specified for encoding the frame.

4. The invention as described in claim 2 CHARACTERIZED IN THAT regions are macroblocks and said step 
of determining determines a perceptual noise sensitivity level for each macroblock thereby characterizing 
each macroblock according to the amount of noise that can be added to the macroblock with respect to 
the disturbing effect that the adding of such noise will have on a viewer perceiving said macroblock with 
said added noise. 

5. The invention as described in claim 1 CHARACTERIZED IN THAT said regions are of two types, a first
type of region being a macroblock and a second type of region being a subblock, a plurality of subblocks 
being grouped together to form a macroblock, said step of selecting being CHARACTERIZED BY the 
steps of: 

categorizing each of said subblocks into one of a plurality of statistical activity classes; 

developing estimates of the number of bits required to encode the frame at each psycho-visual
quality level of said plurality; 

comparing said estimates to a predetermined target number of bits that are available to encode 
said frame; and 

picking the psycho-visual quality level having an estimate that is closest to but does not exceed 
said target number of bits. 

6. The invention as described in claim 5 CHARACTERIZED IN THAT each estimate of said step of developing
is given for any particular psycho-visual quality level Q' by:

\[
\sum_{PNS_{indx}} \sum_{SAC_{indx}} \mathrm{HIS}[PNS_{indx}][SAC_{indx}] \times \mathrm{BITS\_TABLE}\big[\, q_p\mathrm{\_TABLE}[Q'][PNS_{indx}] \,\big][SAC_{indx}]
\]

where PNS_indx is a variable whose range spans over the entire plurality of perceptual noise sensitivity
classes, SAC_indx is a variable whose range spans over the entire plurality of statistical activity classes,
q_p_TABLE[Q'][PNS_indx] corresponds to the value of a quantization parameter for use in encoding
macroblocks belonging to perceptual noise sensitivity class PNS_indx that must be encoded to achieve
psycho-visual quality level Q' and HIS[PNS_indx][SAC_indx] is an estimate of the number of subblocks in statistical
activity class SAC_indx that are also in a perceptual noise sensitivity class PNS_indx.

7. Apparatus for use in the quantization of at least a portion of a video signal comprised of frames, CHAR- 
ACTERIZED BY: 

first means for categorizing [105] first regions into which said frame is divided into one of a plurality 
of predetermined perceptual noise sensitivity classes; 

means for selecting a target psycho-visual quality level [130, 115] for encoding said frame from a
plurality of predetermined psycho-visual quality levels; and 

means for determining a quantization parameter [120] for each of said regions, said means for de- 
termining being responsive to the perceptual noise sensitivity class into which each of said regions is cat- 
egorized by said means for categorizing and the target psycho-visual quality level selected for said frame 
by said means for selecting. 

8. The apparatus as defined in claim 7 CHARACTERIZED IN THAT said means for selecting is CHARAC- 
TERIZED BY: 

means for generating estimates of the number of bits required to encode said frame with each of 
said predetermined plurality of psycho-visual quality levels[115]; 

means for receiving a predetermined target number of bits that are available to encode said frame; 

and 

means for picking the psycho-visual quality level that has an estimate that is closest to said target
number of bits [130].

9. The apparatus as defined in claim 7 CHARACTERIZED IN THAT said means for selecting is CHARAC- 
TERIZED BY: 

means for generating estimates of the number of bits required to encode said frame with each of 
said predetermined plurality of psycho-visual quality levels[115]; 

means for receiving a predetermined target number of bits that are available to encode said frame; 

and 

means for picking the psycho-visual quality level that has an estimate that is closest to, but does 
not exceed said target number of bits[130]. 



10. The apparatus as defined in claim 8 or 9 CHARACTERIZED IN THAT said means for generating estimates 
is CHARACTERIZED BY: 

second means for categorizing [180] second regions into which said frame is divided into one of a
plurality of predetermined statistical activity classes; and

means, responsive to said first and second means for categorizing, for determining the number of 
bits that will be generated [150] by the encoding of one of the second regions that is included within one 
of the first regions such that a particular psycho-visual quality level is achieved. 

11. The apparatus as defined in claim 10 CHARACTERIZED IN THAT said first regions are macroblocks and said
second regions are subblocks.

12. The apparatus as defined in claim 7 further CHARACTERIZED BY:

means for supplying said quantization parameter as an output [155]; and
means for quantizing [120], responsive to said quantization parameter, for quantizing a portion of
an encoded version of said video signal.